A profile entropy dependent scoring function for protein threading

Authors

  • Jian Peng
  • Jinbo Xu
Abstract

Proteins play fundamental roles in all biological processes. Akin to the complete sequencing of genomes, a complete description of protein structures is a fundamental step towards understanding biological life, and is also highly relevant to the development of therapeutics and drugs. Computational prediction methods, especially template-based modeling, can quickly generate crude but useful structure models at a large scale. The challenge of template-based modeling lies in recognizing correct templates and generating accurate sequence-template alignments. Evolutionary information (i.e., sequence profiles) has proved to be very powerful in detecting remote homologs, as demonstrated by the state-of-the-art profile-profile alignment method HHpred. However, many proteins still lack good sequence profiles. Here, we present a new protein threading method for proteins without good sequence profiles that nonlinearly combines evolutionary and non-evolutionary information. In particular, we model protein threading using a probabilistic graphical model, the conditional (Markov) random field (CRF), and train the model using a gradient tree boosting algorithm. The resultant threading model guides sequence-template alignment using a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among different kinds of protein information. Experimental results indicate that when evolutionary information is not good enough, this new threading method greatly outperforms HHpred in terms of both alignment accuracy and fold recognition rate. The paradigm presented here for the design of a nonlinear scoring function is very general. It can also be applied to protein sequence alignment and RNA alignment.
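To make the idea of "a scoring function consisting of a collection of regression trees" concrete, the following is a minimal sketch, not the authors' implementation. It assumes two hypothetical per-position features (a profile similarity and a secondary-structure match, both in [0, 1]), two hand-written trees with made-up thresholds and leaf values, and a plain global-alignment dynamic program with a linear gap penalty. The CRF formulation and the gradient tree boosting training step described in the abstract are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical feature vector for aligning one sequence position to one
# template position: (profile_similarity, secondary_structure_match).
Features = Tuple[float, float]
Tree = Callable[[Features], float]


def tree_profile_then_structure(x: Features) -> float:
    """Depth-2 tree: trust the profile when it is strong, else fall back
    to the structural feature. Thresholds and leaf values are invented."""
    profile_sim, ss_match = x
    if profile_sim > 0.5:
        return 2.0
    return 1.0 if ss_match > 0.5 else -1.0


def tree_structure_bonus(x: Features) -> float:
    """Depth-1 tree: small bonus when secondary structure agrees."""
    return 0.5 if x[1] > 0.5 else 0.0


TREES: List[Tree] = [tree_profile_then_structure, tree_structure_bonus]


def ensemble_score(trees: Sequence[Tree], x: Features) -> float:
    """Nonlinear match score: the sum of all regression-tree outputs."""
    return sum(tree(x) for tree in trees)


def align_score(feat: Sequence[Sequence[Features]],
                trees: Sequence[Tree],
                gap: float = -1.0) -> float:
    """Best global alignment score, where feat[i][j] holds the features
    for aligning sequence position i to template position j."""
    n = len(feat)
    m = len(feat[0]) if n else 0
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + ensemble_score(trees, feat[i - 1][j - 1])
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


# Toy 2x2 example: strong profile signal on the diagonal, weak elsewhere.
toy_features = [
    [(0.9, 0.0), (0.1, 0.0)],
    [(0.1, 0.0), (0.9, 0.0)],
]
toy_best = align_score(toy_features, TREES)
```

In this sketch the first tree captures an interaction that a linear combination cannot express: the structural feature only matters when the profile signal is weak, which mirrors the abstract's motivation of relying on non-evolutionary information when sequence profiles are poor.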


Related resources

Low-homology protein threading

MOTIVATION The challenge of template-based modeling lies in the recognition of correct templates and generation of accurate sequence-template alignments. Homologous information has proved to be very powerful in detecting remote homologs, as demonstrated by the state-of-the-art profile-based method HHpred. However, HHpred does not fare well when proteins under consideration are low-homology. A p...


A conditional neural fields model for protein threading

MOTIVATION Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%). RESULTS We present a novel protein threading method, CNFpred, which achieves much more accurate sequence-template alignment by employi...


Advances in Protein Structure Prediction: Algorithms and Applications

The design of scoring functions (or potentials) for threading, differentiating native-like from non-native structures with a limited computational cost, is an active field of research. We revisit two widely used families of threading potentials: the pairwise and profile models. To design optimal scoring functions we use linear programming (LP). The LP protocol makes it possible to measure the d...


Boosting Protein Threading Accuracy

Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function linearly combining sequence and structure features to measure the quality of a sequence-template alignment so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdep...


Protein threading using context-specific alignment potential

MOTIVATION Template-based modeling, including homology modeling and protein threading, is the most reliable method for protein 3D structure prediction. However, alignment errors and template selection are still the main bottlenecks for current template-based modeling methods, especially when proteins under consideration are distantly related. RESULTS We present a novel context-specific alignmen...




Publication date: 2009